The human heart faces significant impacts from factors such as age and sex. Current research indicates that males are more susceptible to heart disease than females (Weidner 2000), with the risk increasing with age for both genders (Costantino, Paneni, and Cosentino 2016). To deepen our understanding of these trends, this study investigates whether epigenetic modifications correlate with observed differences in heart disease susceptibility.
Epigenetic modifications, particularly histone modifications, play a critical role in regulating gene expression without altering the DNA sequence. These changes can influence heart development, function, and disease susceptibility (Soler-Botija, Gálvez-Montón, and Bayés-Genís 2019). By studying epigenetic modifications, researchers can uncover mechanisms that traditional genetic studies might miss, providing a deeper understanding of how environmental and lifestyle factors impact heart health. This can lead to more targeted and effective treatments and preventive strategies for cardiovascular diseases.
This study specifically examines the roles of histone modifications H3K4me1 (histone H3 lysine 4 monomethylation), H3K9me3 (histone H3 lysine 9 trimethylation), and H3K27ac (histone H3 lysine 27 acetylation) in gene activation and repression (Hyun et al. 2017), along with using ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) to assess chromatin accessibility. Furthermore, the co-binding patterns of these histone modifications with CTCF (CCCTC-binding factor), a key transcriptional regulator involved in chromatin organization (Holwerda and Laat 2013), are explored to provide a comprehensive understanding of their regulatory roles.
This study will consist of four different analyses each dedicated to specific aspects of epigenetic modifications:
1. Differential Peak Calling Analysis – Examining differences in chromatin states across demographic groups.
2. Clustering and Visualization Analysis – Visualizing and identifying patterns of enhancer activity.
3. Gene Association Analysis – Exploring associations between epigenetic modifications and gene expression.
4. Co-Binding Analysis – Investigating co-binding interactions within regulatory networks.
suppressPackageStartupMessages({
library(AnnotationHub)
library(ensembldb)
library(GenomicRanges)
library(ggplot2)
library(motifmatchr)
library(Biostrings)
library(MotifDb)
library(TFBSTools)
library(universalmotif)
library(PWMEnrich)
library(rtracklayer)
library(biomaRt)
library(EnrichedHeatmap)
library(GenomicAlignments)
library(edgeR)
library(epiwraps)
library(rGREAT)
library(knitr)
library(kableExtra)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(ChIPseeker)
library(org.Hs.eg.db)
library(limma)
library(DESeq2)
library(csaw)
library(sechm)
library(GenomicFeatures)
library(GenomicInteractions)
library(clusterProfiler)
library(BSgenome.Hsapiens.UCSC.hg38)
library(DiffBind)
library(bookdown)
})
All data for this study was sourced from the ENCODE database, focusing on the left ventricle of the heart, which pumps oxygenated blood to the body. The demographic groups analyzed include old male (73 years), young male (34 years), old female (59 years), and young female (46 years).
Our variables are labeled as short but descriptive as possible containing following information:
- Variable of interest
- Type of data, e.g. peaks
- group, e.g. old male (om), young female (yf)
dir.create("data")
### Old male (73y)
# ATAC: <https://www.encodeproject.org/experiments/ENCSR769DGC/>\
download.file("https://www.encodeproject.org/files/ENCFF743LDX/@@download/ENCFF743LDX.bed.gz",
destfile = "data/ATACpeak_om") #o = old, m = male
# H3K4me1: <https://www.encodeproject.org/experiments/ENCSR461MTO/>\
download.file("https://www.encodeproject.org/files/ENCFF735CIX/@@download/ENCFF735CIX.bed.gz",
destfile = "data/H3K4me1peak_om") #o = old, m = male
# H3K9me3: <https://www.encodeproject.org/experiments/ENCSR843ETK/>\
download.file("https://www.encodeproject.org/files/ENCFF944MPR/@@download/ENCFF944MPR.bed.gz",
destfile = "data/H3K9me3peak_om") #o = old, m = male
# H3K27ac: <https://www.encodeproject.org/experiments/ENCSR402JWL/>\
download.file("https://www.encodeproject.org/files/ENCFF610WJV/@@download/ENCFF610WJV.bed.gz",
destfile = "data/H3K27acpeak_om") #o = old, m = male
# bam files for differential analysis
# H3K4me1: <https://www.encodeproject.org/experiments/ENCSR461MTO/>\
options(timeout=3600)
download.file("https://www.encodeproject.org/files/ENCFF230EIS/@@download/ENCFF230EIS.bam",
destfile = "data/H3K4me1bam_om") #o = old, m = male
H3K4me1bam_om <- BamFile("data/H3K4me1bam_om")
indexBam(H3K4me1bam_om)
options(timeout=3600)
# RNA-seq data: <https://www.encodeproject.org/experiments/ENCSR131FDP/>\
download.file("https://www.encodeproject.org/files/ENCFF417VDK/@@download/ENCFF417VDK.tsv",
destfile = "data/RNA_om") #o = old, m = male (69y)
RNA_om <- read.table("data/RNA_om", header = TRUE)
options(timeout=3600)
# CTCF: <https://www.encodeproject.org/experiments/ENCSR578AKX/>\
download.file("https://www.encodeproject.org/files/ENCFF182MYP/@@download/ENCFF182MYP.bed.gz",
destfile = "data/CTCF_om") #o = old, m = male (73y)
CTCF_om <- read.table("data/CTCF_om", header = FALSE)
CTCF_om <- GRanges(seqnames = CTCF_om$V1,
ranges = IRanges(start = CTCF_om$V2, end = CTCF_om$V3))
#________________________________________________________________________________________
### Young male (34y)
# ATAC (43y): <https://www.encodeproject.org/experiments/ENCSR310RJN/>\
download.file("https://www.encodeproject.org/files/ENCFF568NPG/@@download/ENCFF568NPG.bed.gz",
destfile = "data/ATACpeak_ym") #y = young, m = male
# H3K4me1: <https://www.encodeproject.org/experiments/ENCSR111WGZ/>\
download.file("https://www.encodeproject.org/files/ENCFF569RXT/@@download/ENCFF569RXT.bed.gz",
destfile = "data/H3K4me1peak_ym") #y = young, m = male
# H3K9me3: <https://www.encodeproject.org/experiments/ENCSR176KNR/>\
download.file("https://www.encodeproject.org/files/ENCFF173FHK/@@download/ENCFF173FHK.bed.gz",
destfile = "data/H3K9me3peak_ym") #y = young, m = male
# H3K27ac: <https://www.encodeproject.org/experiments/ENCSR150QXE/>\
download.file("https://www.encodeproject.org/files/ENCFF094TOI/@@download/ENCFF094TOI.bed.gz",
destfile = "data/H3K27acpeak_ym") #y = young, m = male
# H3K4me1: <https://www.encodeproject.org/experiments/ENCSR111WGZ/>\
options(timeout=3600)
download.file("https://www.encodeproject.org/files/ENCFF018JTV/@@download/ENCFF018JTV.bam",
destfile = "data/H3K4me1bam_ym") #y = young, m = male
H3K4me1bam_ym <- BamFile("data/H3K4me1bam_ym")
indexBam(H3K4me1bam_ym)
# RNA-seq data: <https://www.encodeproject.org/experiments/ENCSR924MSZ/>\
download.file("https://www.encodeproject.org/files/ENCFF311HGE/@@download/ENCFF311HGE.tsv",
destfile = "data/RNA_ym") #y = young, m = male (40y)
RNA_ym <- read.table("data/RNA_ym", header = TRUE)
# CTCF: <https://www.encodeproject.org/experiments/ENCSR355PMV/>\
download.file("https://www.encodeproject.org/files/ENCFF641ZTX/@@download/ENCFF641ZTX.bed.gz",
destfile = "data/CTCF_ym") #y = young, m = male (40y)
CTCF_ym <- read.table("data/CTCF_ym", header = FALSE)
CTCF_ym <- GRanges(seqnames = CTCF_ym$V1,
ranges = IRanges(start = CTCF_ym$V2, end = CTCF_ym$V3))
# CTCF .bw: <https://www.encodeproject.org/experiments/ENCSR355PMV/>\
options(timeout=6000)
download.file("https://www.encodeproject.org/files/ENCFF366NLF/@@download/ENCFF366NLF.bigWig",
destfile = "data/CTCFbw_ym") #y = young, m = male (40y)
CTCFbw_ym <- read.table("data/CTCFbw_ym", header = FALSE)
download.file("https://www.encodeproject.org/files/ENCFF366NLF/@@download/ENCFF366NLF.bigWig",
destfile = "CTCFbw_yf") #y = young, f = female (46y)
CTCFbw_yf <- read.table("CTCFbw_yf", header = FALSE)
# DELETE? CTCF.bed (43y): <https://www.encodeproject.org/experiments/ENCSR355PMV/>\
download.file("https://www.encodeproject.org/files/ENCFF641ZTX/@@download/ENCFF641ZTX.bed.gz",
destfile = "CTCFbed_ym") #y = young, m = male
CTCFbed_ym <- read.table("CTCFbed_ym", header = FALSE)
# Select the first three columns
CTCFbed_ym_subset <- CTCFbed_ym[, 1:3]
# Write the selected columns to a new file
write.table(CTCFbed_ym_subset, file = "CTCFbed_ym_subset.bed", sep = "\t",
row.names = FALSE, col.names = FALSE, quote = FALSE)
CTCFbed_ym <- GRanges(seqnames = CTCFbed_ym$V1,
ranges = IRanges(start = CTCFbed_ym$V2, end = CTCFbed_ym$V3))
save(CTCFbed_ym, file = "CTCFbed_ym.RData")
#_______________________________________________________________________________
### Old female (59y)
# ATAC: <https://www.encodeproject.org/experiments/ENCSR925LGW/>\
download.file("https://www.encodeproject.org/files/ENCFF947DKR/@@download/ENCFF947DKR.bed.gz",
destfile = "data/ATACpeak_of") #o = old, f = female
# H3K4me1: <https://www.encodeproject.org/experiments/ENCSR485LPA/>\
download.file("https://www.encodeproject.org/files/ENCFF463WJH/@@download/ENCFF463WJH.bed.gz",
destfile = "data/H3K4me1peak_of") #o = old, f = female
# H3K9me3: <https://www.encodeproject.org/experiments/ENCSR284DWQ/>\
download.file("https://www.encodeproject.org/files/ENCFF604IOR/@@download/ENCFF604IOR.bed.gz",
destfile = "data/H3K9me3peak_of") #o = old, f = female
# H3K27ac: <https://www.encodeproject.org/experiments/ENCSR884SIF/>\
download.file("https://www.encodeproject.org/files/ENCFF095OXH/@@download/ENCFF095OXH.bed.gz",
destfile = "data/H3K27acpeak_of") #o = old, f = female
# old female 59y
# H3K4me1: <https://www.encodeproject.org/experiments/ENCSR485LPA/>\
options(timeout=3600)
download.file("https://www.encodeproject.org/files/ENCFF392MXC/@@download/ENCFF392MXC.bam",
destfile = "data/H3K4me1bam_of") #o = old, f = female
H3K4me1bam_of <- BamFile("data/H3K4me1bam_of")
indexBam(H3K4me1bam_of)
# RNA-seq data: <https://www.encodeproject.org/experiments/ENCSR900FUP/>\
download.file("https://www.encodeproject.org/files/ENCFF408FCD/@@download/ENCFF408FCD.tsv",
destfile = "data/RNA_of") #o = old, f = female (59y)
RNA_of <- read.table("data/RNA_of", header = TRUE)
# CTCF: <https://www.encodeproject.org/experiments/ENCSR778ZPK/>\
download.file("https://www.encodeproject.org/files/ENCFF805CXH/@@download/ENCFF805CXH.bed.gz",
destfile = "data/CTCF_of") #o = old, f = female (59y)
CTCF_of <- read.table("data/CTCF_of", header = FALSE)
CTCF_of <- GRanges(seqnames = CTCF_of$V1,
ranges = IRanges(start = CTCF_of$V2, end = CTCF_of$V3))
#________________________________________________________________________________________
### Young female(46y)
# ATAC (47y): <https://www.encodeproject.org/experiments/ENCSR399OSE/>\
download.file("https://www.encodeproject.org/files/ENCFF056KYH/@@download/ENCFF056KYH.bed.gz",
destfile = "data/ATACpeak_47yf") #y = young, f = female
# ATAC (42y): <https://www.encodeproject.org/experiments/ENCSR593YFB/>\
download.file("https://www.encodeproject.org/files/ENCFF904QPU/@@download/ENCFF904QPU.bed.gz",
destfile = "data/ATACpeak_42yf") #y = young, f = female
# H3K4me1: <https://www.encodeproject.org/experiments/ENCSR848BXL/>\
download.file("https://www.encodeproject.org/files/ENCFF836HMH/@@download/ENCFF836HMH.bed.gz",
destfile = "data/H3K4me1peak_yf") #y = young, f = female, 46
# H3K9me3: <https://www.encodeproject.org/experiments/ENCSR433LHD/>\
download.file("https://www.encodeproject.org/files/ENCFF144UHV/@@download/ENCFF144UHV.bed.gz",
destfile = "data/H3K9me3peak_yf") #y = young, f = female, 46
# H3K27ac: <https://www.encodeproject.org/experiments/ENCSR863BVD/>\
download.file("https://www.encodeproject.org/files/ENCFF214YFB/@@download/ENCFF214YFB.bed.gz",
destfile = "data/H3K27acpeak_yf") #y = young, f = female, 46
# young female 46y
# H3K4me1: <https://www.encodeproject.org/experiments/ENCSR848BXL/>\
download.file("https://www.encodeproject.org/files/ENCFF708RPB/@@download/ENCFF708RPB.bam",
destfile = "data/H3K4me1bam_yf") #y = young, f = female
H3K4me1bam_yf <- BamFile("data/H3K4me1bam_yf")
indexBam(H3K4me1bam_yf)
# RNA-seq data: <https://www.encodeproject.org/experiments/ENCSR837VMK/>\
download.file("https://www.encodeproject.org/files/ENCFF440HGB/@@download/ENCFF440HGB.tsv",
destfile = "data/RNA_yf") #y = young, f = female (y)
RNA_yf <- read.table("data/RNA_yf", header = TRUE)
# CTCF: <https://www.encodeproject.org/experiments/ENCSR565HBN/>\
download.file("https://www.encodeproject.org/files/ENCFF794LOO/@@download/ENCFF794LOO.bed.gz",
destfile = "data/CTCF_yf") #y = young, f = female (46y)
CTCF_yf <- read.table("data/CTCF_yf", header = FALSE)
CTCF_yf <- GRanges(seqnames = CTCF_yf$V1,
ranges = IRanges(start = CTCF_yf$V2, end = CTCF_yf$V3))
options(timeout = 6000)
# CTCF .bw: <https://www.encodeproject.org/experiments/ENCSR565HBN/>\
download.file("https://www.encodeproject.org/files/ENCFF689OGE/@@download/ENCFF689OGE.bigWig",
destfile = "data/CTCFbw_yf") #y = young, f = female (46y)
CTCFbw_yf <- read.table("data/CTCFbw_yf", header = FALSE)
download.file("https://www.encodeproject.org/files/ENCFF689OGE/@@download/ENCFF689OGE.bigWig",
destfile = "CTCFbw_yf") #y = young, f = female (46y)
CTCFbw_yf <- read.table("CTCFbw_yf", header = FALSE)
# CTCF.bed (46y): <https://www.encodeproject.org/experiments/ENCSR565HBN/>\
download.file("https://www.encodeproject.org/files/ENCFF794LOO/@@download/ENCFF794LOO.bed.gz",
destfile = "CTCFbed_yf") #y = young, f = female
CTCFbed_yf <- read.table("CTCFbed_yf", header = FALSE)
# Select the first three columns
CTCFbed_yf_subset <- CTCFbed_yf[, 1:3]
# Write the selected columns to a new file
write.table(CTCFbed_yf_subset, file = "CTCFbed_yf_subset.bed", sep = "\t",
row.names = FALSE, col.names = FALSE, quote = FALSE)
CTCFbed_yf <- GRanges(seqnames = CTCFbed_yf$V1,
ranges = IRanges(start = CTCFbed_yf$V2, end = CTCFbed_yf$V3))
The data for the clustering and visualization part are separate for a better overview.
# .bed Data
## Young Male
# H3K4me1: <https://www.encodeproject.org/experiments/ENCSR111WGZ/>\
download.file("https://www.encodeproject.org/files/ENCFF569RXT/@@download/ENCFF569RXT.bed.gz",
destfile = "data/H3K4me1peak_ym_CV") #y = young, m = male
# H3K27ac 34y: <https://www.encodeproject.org/experiments/ENCSR150QXE/>\
download.file("https://www.encodeproject.org/files/ENCFF059ERB/@@download/ENCFF059ERB.bed.gz",
destfile = "H3K27acpeak_ym_CV") #y = young, m = male
#________________________________
## Young Female
# H3K4me1: <https://www.encodeproject.org/experiments/ENCSR848BXL/>\
download.file("https://www.encodeproject.org/files/ENCFF836HMH/@@download/ENCFF836HMH.bed.gz",
destfile = "H3K4me1peak_yf_CV") #y = young, f = female, 46
# H3K27ac: <https://www.encodeproject.org/experiments/ENCSR863BVD/>\
download.file("https://www.encodeproject.org/files/ENCFF214YFB/@@download/ENCFF214YFB.bed.gz",
destfile = "H3K27acpeak_yf_CV") #y = young, f = female, 46
#_______________________________________________________________________________
# .bw Data
## Young Male
options(timeout = 6000)
# H3K4me1: <https://www.encodeproject.org/experiments/ENCSR111WGZ/>\
download.file("https://www.encodeproject.org/files/ENCFF202DON/@@download/ENCFF202DON.bigWig",
destfile = "H3K4me1BW_ym.bigWig") #y = young, m = male
# H3K27ac: <https://www.encodeproject.org/experiments/ENCSR150QXE/>\
download.file("https://www.encodeproject.org/files/ENCFF326LWU/@@download/ENCFF326LWU.bigWig",
destfile = "H3K27acBW_ym.bigWig") #y = young, m = male
## Young Female
# H3K4me1: <https://www.encodeproject.org/experiments/ENCSR848BXL/>\
download.file("https://www.encodeproject.org/files/ENCFF213UYA/@@download/ENCFF213UYA.bigWig",
destfile = "H3K4me1BW_yf.bigWig") #y = young, f = female, 46
# H3K27ac: <https://www.encodeproject.org/experiments/ENCSR863BVD/>\
download.file("https://www.encodeproject.org/files/ENCFF094PUR/@@download/ENCFF094PUR.bigWig",
destfile = "H3K27acBW_yf.bigWig") #y = young, f = female, 46
In this first section we compare peak data between age groups (young vs. old) and between sexes (male vs. female) to identify differential binding patterns. We start with the prevalence of H3K9me3 peaks in regions associated with cardiac muscle contraction genes. It is expected to be higher in older individuals leading to repressive chromatin states and potentially reduced expression of these genes. This differential repressive binding might contribute to the age-related cardiac functional decline.
H3K9me3 is a histone modification that can lead to a more tight binding of the DNA to the histone complex. This can be translated into the formation of heterochromatin. During aging, heterochromatin overall decreases. However, so called ‘senescence-associated heterochromatin foci (SAHF)’ increase in order to silence genes that promote cell division (Lee et al. 2020). Similarly to that we want to see whether regions that are associated with cardiac muscle contraction genes are also silenced by H3K9me3. If that is the case, people of older age should have more H3K9me in those regions contributing to the higher risk of heart diseases in elderly people.
Using the Gene Ontology Browser we could identify our genes of interest - the genes for cardiac muscle contraction and other heart-related processes (“Adult Heart Development Gene Ontology Term (GO:0007512),” n.d.).
# Get genome for GRanges object of cardiac genes later
ah <- AnnotationHub()
q <- AnnotationHub::query(ah, c("ENSEMBL", "Homo sapiens", "EnsDb", "GRCh38"))
# apparently hg38 and GRCh38 are the same thing
# q
ensdb <- ah[["AH116291"]]
genes <- genes(ensdb)
seqlevelsStyle(genes) <- "UCSC"
# Use biomaRt to find genes associated with cardiac muscle contraction
mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# Query genes associated with cardiac muscle contraction
cardiac_genes <- getBM(
attributes = c("ensembl_gene_id", "external_gene_name", "description"),
filters = "go",
values = c("GO:0060048", "GO:0003015", "GO:0007507"), #GO term for cardiac
# muscle contraction @GOheartdevelopment: 0060048, heart processes: 0003015,
# general heart development: 0007507
mart = mart)
# Extract the Ensembl gene IDs
ensembl_gene_ids <- cardiac_genes$ensembl_gene_id
# Find the corresponding genomic ranges in the EnsDb database
cardiac_granges <- genes[genes$gene_id %in% ensembl_gene_ids]
As a next step, we overlap these genes with the peaks of the histone modification H3K9me3. That will give us an indication about their potential chromatin state.
# load the data
H3K9me3peak_ym <- read.table("data/H3K9me3peak_ym", header = FALSE)
H3K9me3peak_ym <- GRanges(seqnames = H3K9me3peak_ym$V1,
ranges = IRanges(start = H3K9me3peak_ym$V2, end = H3K9me3peak_ym$V3))
seqlevelsStyle(H3K9me3peak_ym) <- "UCSC"
H3K9me3peak_yf <- read.table("data/H3K9me3peak_yf", header = FALSE)
H3K9me3peak_yf <- GRanges(seqnames = H3K9me3peak_yf$V1,
ranges = IRanges(start = H3K9me3peak_yf$V2, end = H3K9me3peak_yf$V3))
H3K9me3peak_om <- read.table("data/H3K9me3peak_om", header = FALSE)
H3K9me3peak_om <- GRanges(seqnames = H3K9me3peak_om$V1,
ranges = IRanges(start = H3K9me3peak_om$V2, end = H3K9me3peak_om$V3))
H3K9me3peak_of <- read.table("data/H3K9me3peak_of", header = FALSE)
H3K9me3peak_of <- GRanges(seqnames = H3K9me3peak_of$V1,
ranges = IRanges(start = H3K9me3peak_of$V2, end = H3K9me3peak_of$V3))
# young male
length(cardiac_granges)
## [1] 304
overlaps_heart_H3K9me3_ym <- sum(overlapsAny(cardiac_granges, H3K9me3peak_ym))
# young female
overlaps_heart_H3K9me3_yf <- sum(overlapsAny(cardiac_granges, H3K9me3peak_yf))
# old male
overlaps_heart_H3K9me3_om <- sum(overlapsAny(cardiac_granges, H3K9me3peak_om))
# old female
overlaps_heart_H3K9me3_of <- sum(overlapsAny(cardiac_granges, H3K9me3peak_of))
# Summary table
summary_df2 <- data.frame(
Age = c("Young", "Young", "Old", "Old"),
Gender = c("Male", "Female", "Male", "Female"),
Total_Genes = rep(length(cardiac_granges), 4),
Overlapping_Genes = c(overlaps_heart_H3K9me3_ym, overlaps_heart_H3K9me3_yf, overlaps_heart_H3K9me3_om, overlaps_heart_H3K9me3_of)
)
# Plotting the overlaps
ggplot(summary_df2, aes(x = Age, y = Overlapping_Genes, fill = Gender)) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_text(aes(label = Overlapping_Genes), vjust = -0.5, position = position_dodge(0.9)) +
labs(x = "Age Group",
y = "Number of Overlapping Genes",) +
theme_minimal()
Figure 1: Cardiac Genes bound by H3K9me3
Focusing on age differences, a notable decrease of H3K9me3 in men with age can be seen in Figure 1. This observation aligns with the previously mentioned overall loss of heterochromatin during aging. It could indicate an increase in transcription of these cardiac genes. That, however, proves our hypothesis wrong and suggests a more complex interplay between the histone modifications and muscle contraction.
The largest difference can be seen in the young age group between males and females. Apparently, 69 out of 304 cardiac genes are bound by the histone modification H3K9me3 in young men. That is quite a big difference compared to the 8 genes in females, thereby highlighting differences in sex. Considering that young men have a higher risk of heart diseases than young women, our findings could be a contributing factor (“Gender Differences in Cardiovascular Disease” 2019). These initial findings lead us to further investigate how different histone modifications might influence these heart related regions. Specifically, which biological processes are affected by them.
We hypothesize that the Histone modifications H3K4me1 and H3K27ac exhibit distinct clustering patterns in cardiac genes. Specifically, we propose that young females show higher levels of these modifications in regions linked to cardiac muscle development and maintenance, suggesting that females may have a greater capability to maintain heart cell health.
We look at genes with enhancers enriched in both H3K4me1 and H3K27ac. While H3K4me1 is a common histone modification present at open enhancers, the co-presence of H3K27ac indicates active enhancers (spicuglia and Vanhille 2012). By assessing these domains, we obtain information about the indeed transcribed regions rather than just open regions that could potentially be transcribed.
# load Data
H3K4me1peak_ym_CV <- read.table("H3K4me1peak_ym_CV", header = FALSE)
H3K4me1peak_ym_gr <- GRanges(seqnames = H3K4me1peak_ym_CV$V1,
ranges = IRanges(start = H3K4me1peak_ym_CV$V2, end = H3K4me1peak_ym_CV$V3),
score = H3K4me1peak_ym_CV$V5)
H3K27acpeak_ym_CV <- read.table("H3K27acpeak_ym_CV", header = FALSE)
H3K27acpeak_ym_gr <- GRanges(seqnames = H3K27acpeak_ym_CV$V1,
ranges = IRanges(start = H3K27acpeak_ym_CV$V2, end = H3K27acpeak_ym_CV$V3),
score = H3K27acpeak_ym_CV$V5)
H3K4me1peak_yf_CV <- read.table("H3K4me1peak_yf_CV", header = FALSE)
H3K4me1peak_yf_gr <- GRanges(seqnames = H3K4me1peak_yf_CV$V1,
ranges = IRanges(start = H3K4me1peak_yf_CV$V2, end = H3K4me1peak_yf_CV$V3),
score = H3K4me1peak_yf_CV$V5)
H3K27acpeak_yf_CV <- read.table("H3K27acpeak_yf_CV", header = FALSE)
H3K27acpeak_yf_gr <- GRanges(seqnames = H3K27acpeak_yf_CV$V1,
ranges = IRanges(start = H3K27acpeak_yf_CV$V2, end = H3K27acpeak_yf_CV$V3),
score = H3K27acpeak_yf_CV$V5)
Again, we focus on the heart-related regions in the young male and female and overlap the data with the cardiac_granges from before.
## Young Male
overlaps_H3K4me1_ym <- findOverlaps(H3K4me1peak_ym_gr, cardiac_granges)
# Extract the overlapping ranges from gr1 and gr2
heartregion_H3K4me1_ym <- H3K4me1peak_ym_gr[unique(queryHits(overlaps_H3K4me1_ym))]
overlaps_H3K27ac_ym <- findOverlaps(H3K27acpeak_ym_gr, cardiac_granges)
heartregion_H3K27ac_ym <- H3K27acpeak_ym_gr[unique(queryHits(overlaps_H3K27ac_ym))]
## Young Female
overlaps_H3K4me1_yf <- findOverlaps(H3K4me1peak_yf_gr, cardiac_granges)
heartregion_H3K4me1_yf <- H3K4me1peak_yf_gr[unique(queryHits(overlaps_H3K4me1_yf))]
overlaps_H3K27ac_yf <- findOverlaps(H3K27acpeak_yf_gr, cardiac_granges)
heartregion_H3K27ac_yf <- H3K27acpeak_yf_gr[unique(queryHits(overlaps_H3K27ac_yf))]
We convert and save the filtered GRanges object back to .bed-files.
# Export the GRanges object to a BED file
export(heartregion_H3K4me1_ym, "heartregion_H3K4me1_ym.bed")
export(heartregion_H3K27ac_ym, "heartregion_H3K27ac_ym.bed")
export(heartregion_H3K4me1_yf, "heartregion_H3K4me1_yf.bed")
export(heartregion_H3K27ac_yf, "heartregion_H3K27ac_yf.bed")
# we first import the peaks
tracks <- list.files(pattern="bigWig$") # = binding intensities for all 3 CREBs
peaks <- list.files(pattern="heartregion")
#peaks_bed <- paste0(peaks, ".bed") # Add the ".bed" suffix to each filename
# we first import the peaks
peaks <- lapply(peaks, rtracklayer::import.bed)
# we'll focus on the high-quality peaks
peaks <- lapply(peaks, FUN=function(x) x[x$score>100])
# we get the union of non-redundant regions
regions <- reduce(unlist(GRangesList(peaks)))
Then we plot the activities of different histone modifications between Sexes
ese <- signal2Matrix(tracks, regions, extend=2000)
ese2 <- ese[1:1000,]
plotEnrichedHeatmaps(ese2, cluster_rows = TRUE, show_row_dend=TRUE )
Figure 2: Activities of Different Histone Modifications Between Sexes
H3K27ac shows higher enrichment in young females, while H3K4me1 shows higher enrichment in young males in the analyzed regions. This could reflect underlying biological differences in chromatin states and gene regulation between the sexes.
cl2 <- clusterSignalMatrices(ese, k=2:10)
ggplot(cl2$varExplained, aes(k, varExplained)) + geom_line()
Figure 3: Elbow-Plot
The elbow plot helps balancing between too few clusters (oversimplifying the data) and too many clusters (overfitting and complicating the interpretation). Here the pivoting point could be set on k=3, explaining roughly 85 % of the variance. This suggests that the clustering effectively reflects significant patterns or relationships in the data.
set.seed(123) # to ensure that it gives the same results everytime
cl <- clusterSignalMatrices(ese, k=3) # dividing data into k=4 clusters
rowData(ese)$cluster <- cl
mycolors <- c("1"="purple", "2"="blue", "3"="orange")
plotEnrichedHeatmaps(ese, row_split="cluster", mean_color=mycolors,
colors=c("white","darkred"))
Figure 4: Clustered Heatmap of Histone Modifications
Plotting just the averages:
d <- meltSignals(ese, splitBy=cl)
ggplot(d, aes(position, mean, colour=sample)) + geom_line() + facet_wrap(~split)
Figure 5: Clustered Averages of Histone Modifications
Cluster 2 shows the strongest signal for H3K27ac in young females. It also shows some activity in young males but considerably less so. H3K4me1 seems quite inactive in either group except for a small peak in young males shown in cluster 1.
plotEnrichedHeatmaps(ese, row_split = cl, scale_rows = "global")
Figure 6: Clustered Heatmap of Histone Modifications - globally scaled
In Figure 6 the peak of H3K4me1 in young males appears more pronounced and sharp while the signal of H3K27ac in young females seems strong but rather broad.
To understand the biological significance of the regions in each cluster we identify which biological processes, molecular functions, cellular components, or pathways are overrepresented among cardiac genes. This can provide insights into the functional roles and regulatory mechanisms of the histone modifications and help us with investigating our hypothesis. We focus on Clusters 1 and 2 to analyze the peaks of H3K4me1 in young males as well as young females.
# we first split the regions by cluster:
split_regions <- split(rowRanges(ese), rowData(ese)$cluster)
res1 <- great(split_regions[["1"]], gene_sets="GO:BP", tss_source="hg38",
background=regions, cores=2)
bp1 <- getEnrichmentTables(res1)
ggplot(head(bp1,15), aes(fold_enrichment, reorder(description, p_adjust),
size=observed_region_hits, color=-log10(p_adjust))) +
geom_point() + scale_color_viridis_c()
Figure 7: Enrichment Analysis - Cluster 1
The enrichment analysis for Cluster 1 reveals that regions in this cluster are mostly associated with synaptic function, sensory perception, muscle differentiation, ion transport, and immune system differentiation. Considering that in Cluster 1 we previously observed a small but sharp peak for H3K4me1 in young males and also small but broad peaks for all other groups, this analysis does not allow for much interpretation. Further limitations of this enrichment analysis are generally high p-values, which leave us with a quite low statistical significance.
res2 <- great(split_regions[["2"]], gene_sets="GO:BP", tss_source="hg38",
background=regions, cores=2)
bp2 <- getEnrichmentTables(res2)
ggplot(head(bp2,15), aes(fold_enrichment, reorder(description, p_adjust),
size=observed_region_hits, color=-log10(p_adjust))) +
geom_point() + scale_color_viridis_c()
Figure 8: Enrichment Analysis - Cluster 2
The enrichment analysis for Cluster 2 reveals associations with key biological processes related to cell cycle regulation, DNA damage response, autophagy, and immune responses. These regions play essential roles in ensuring proper cell division, maintaining genomic integrity, managing cellular stress, and activating immune cells. Taking into account that Cluster 2 showed a remarkably bigger peak of H3K27ac in females compared to males, we could deduce that females potentially have a greater capacity to maintain the cells of the left ventricular healthy.
Building upon our previous findings of the differential binding patterns and chromatin state changes associated with histone modifications, we now focus on the differential peaks of H3K4me1 in left ventricular (LV) tissues from both young and old individuals. Our initial analyses have highlighted significant differences in histone modification patterns related to age and sex, suggesting complex regulatory mechanisms at play.
Given these observations, we hypothesize that the differential peaks of H3K4me1 are particularly associated with genes involved in oxidative stress response and energy metabolism. These processes are essential for maintaining cellular health and function, particularly in the heart, which has high energy demands and is susceptible to oxidative damage. To test this hypothesis, we performed a comprehensive analysis including MA plotting, gene expression profiling, heatmap visualization, and Gene Ontology (GO) enrichment analysis.
# Preparing the genome sequence file (here just using a subset)
genome <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
seqlevelsStyle(genome) <- "UCSC"
motifs <- MotifDb::query(MotifDb, c("HOCOMOCOv11-core", "Hsapiens"))
motifs <- do.call(TFBSTools::PWMatrixList, setNames(
universalmotif::convert_motifs(motifs, class="TFBSTools-PWMatrix"),
mcols(motifs)$geneSymbol))
# load data
H3K4me1peak_om <- rtracklayer::import("data/H3K4me1peak_om.bed.gz",
format = "narrowPeak")
H3K4me1peak_ym <- rtracklayer::import("data/H3K4me1peak_ym.bed.gz",
format = "narrowPeak")
H3K4me1peak_of <- rtracklayer::import("data/H3K4me1peak_of.bed.gz",
format = "narrowPeak")
H3K4me1peak_yf <- rtracklayer::import("data/H3K4me1peak_yf.bed.gz",
format = "narrowPeak")
# Get a list of BAM files
bams <- c("data/H3K4me1bam_om", "data/H3K4me1bam_ym", "data/H3K4me1bam_of",
"data/H3K4me1bam_yf")
names(bams) <- c("OldMale", "YoungMale", "OldFemale", "YoungFemale")
# Combine all peaks into a single GRanges object
all_peaks <- c(H3K4me1peak_om, H3K4me1peak_ym, H3K4me1peak_of, H3K4me1peak_yf)
#Chose the Chromosome
chromosome <- "chr1"
# Combine all peaks into a single GRanges object for the specified chromosome
all_peaks <- all_peaks[seqnames(all_peaks) == chromosome]
# Resize peaks to a standard size
all_peaks <- resize(all_peaks, width=300, fix="center")
To ensure that the data used for differential analysis is free from GC content bias, we utilized the chromVAR::addGCBias function to assess the GC content distribution across all peaks. The resulting histogram shows the frequency of peaks with varying GC content.
# Resize peaks to a standard size
all_peaks <- resize(all_peaks, width=300, fix="center")
# Get counts for the combined set of peaks
se <- chromVAR::getCounts(alignment_files = bams, peaks = all_peaks,
paired = FALSE)
# Assign conditions to samples
colData(se)$condition <- c("OldMale", "YoungMale", "OldFemale", "YoungFemale")
row.names(se) <- as.character(granges(se))
se <- chromVAR::addGCBias(se, genome=genome)
hist(rowData(se)$bias, main = "GC Bias Distribution", xlab = "GC Bias",
col = "blue")
Figure 9: GC Bias Distribution
As seen in the histogram, the GC bias distribution approximates a normal distribution, with the majority of peaks centered around a GC content of 0.5. This indicates that there is no significant skew towards GC-rich or GC-poor regions in our data. Therefore, we can confidently proceed with differential analysis, knowing that GC content bias has been effectively accounted for and is not likely to influence our results.
moi <- motifmatchr::matchMotifs(motifs, subject=se, genome=genome)
# ensure reproducibility
set.seed(1234)
# for each peak, we identify similar peaks as background
bg <- chromVAR::getBackgroundPeaks(se, niterations=1000)
# for each motif, we computed per-sample deviations relative to the background
dev <- chromVAR::computeDeviations(object = se, annotations=moi,
background_peaks=bg)
# Assign conditions to samples
colData(se)$condition <- factor(c("Old", "Young", "Old", "Young"))
row.names(se) <- as.character(granges(se))
# Differential analysis with edgeR
dds <- DGEList(counts = assay(se), group = colData(se)$condition)
dds <- calcNormFactors(dds)
# Estimate dispersion
dds <- estimateCommonDisp(dds)
dds <- estimateTagwiseDisp(dds)
# Perform exact test for differential expression
et <- as.data.frame(topTags(exactTest(dds), Inf))
# Plot MA plot for Old vs Young
plotMD(et, column=1, main="MA Plot: Old vs Young", ylim=c(-5, 5))
abline(h = c(-1, 1), col = "green")
Figure 10: MA Plot: Old vs Young
The MA plot illustrates the relationship between average expression levels and the differences in expression between Old and Young conditions. Most points are clustered between -1 and 1.5 on the x-axis, indicating that the majority of features have low to moderate expression levels. On the y-axis, the values range between -4 and 0, suggesting that most features exhibit a decrease in expression in Old tissues compared to Young tissues.
The clustering of points within these ranges indicate a general trend of decreased expression in old tissues. This supports the hypothesis that age-related epigenetic changes contribute to a decline in gene expression. Such a decrease in expression could impact cardiac function through mechanisms related to oxidative stress and metabolic pathways.
# Create a GRanges object from the peak identifiers in et
peak_ranges <- GRanges(seqnames = sub(":.*", "", rownames(et)),
ranges = IRanges(start = as.numeric(sub(".*:(.*)-.*",
"\\1",
rownames(et))),
end = as.numeric(sub(".*-", "",
rownames(et)))))
# Annotate the peaks to gene symbols using TxDb and org.Hs.eg.db
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
peak_annotation <- annotatePeak(peak_ranges, TxDb = txdb,
annoDb = "org.Hs.eg.db")
## >> preparing features information... 2024-07-08 23:38:39
## >> identifying nearest features... 2024-07-08 23:38:40
## >> calculating distance from peak to TSS... 2024-07-08 23:38:41
## >> assigning genomic annotation... 2024-07-08 23:38:41
## >> adding gene annotation... 2024-07-08 23:39:07
## >> assigning chromosome lengths 2024-07-08 23:39:07
## >> done... 2024-07-08 23:39:07
# Extract the gene symbols from the annotation
annotated_genes <- as.data.frame(peak_annotation)$SYMBOL
# Filter out NA values
annotated_genes <- annotated_genes[!is.na(annotated_genes)]
# Ensure the annotated genes are present in the dev object
top_features <- annotated_genes[annotated_genes %in% rownames(dev)]
# Assign annotation colors
metadata(dev)$anno_colors <- list(condition = c(OldMale = "gold",
YoungMale = "orange",
OldFemale = "pink",
YoungFemale = "skyblue"))
# Generate the heatmap plot
sechm::sechm(dev, features = top_features, assayName = "z",
top_annotation = c("condition"))
Figure 11: Gene Expression Heatmap
The heatmap visualizes the z-scores for the most relevant genes across four conditions: Old Female, Old Male, Young Female, and Young Male. This analysis helps in identifying differential gene expression patterns associated with aging and sex.
MEF2D shows very low expression in Young Males and significantly higher expression in the other three categories. Similarly, FOXJ3 exhibits low expression in Young Males and higher expression in the other groups. The Young Male group displays more yellow shades, indicating upregulated genes such as MTF1, NR5A2, PBX1, USF1, RUNX3, and JUN.
In contrast, the Old Male group shows more dark tones, indicating minimal upregulation or downregulation. The female groups (Old Female and Young Female) display similar expression patterns to each other and to the Old Male group, with only slight variances. Generally, they exhibit fewer regions of increased expression compared to the Young Male group.
The greater differences observed in the Young Male group compared to the Young Female group are likely due to the varying age gaps: 39 years between the males (73-34) and 13 years between the females (59-46). The 46-year-old female likely has gene expression patterns more similar to the older group than the younger group.
To further validate the relevance of these genes, a functional enrichment analysis can be performed to determine if these genes are significantly associated with known pathways of oxidative stress response and energy metabolism.
# Perform GREAT analysis on significant differential peaks
res <- great(peak_ranges, gene_sets="GO:BP", tss_source="hg38",
background=all_peaks, cores=2)
# Retrieve the enrichment tables
bp <- getEnrichmentTables(res)
# Visualize the results
ggplot(head(bp, 15), aes(x = fold_enrichment, y = reorder(description, p_adjust),
size = observed_region_hits, color=-log10(p_adjust))) +
geom_point() +
scale_color_viridis_c() +
labs(x = "Fold Enrichment", y = "GO Biological Process",
size = "Observed Region Hits", color = "-log10(p-adjust)") +
theme(axis.text.y = element_text(size = 10))
Figure 12: Enrichment Analysis - Chromosome 1, H3K4me1
The GO enrichment analysis indicates substantial enrichment for processes such as “leukocyte differentiation,” “lymphocyte differentiation,” “positive regulation of programmed cell death,” and “positive regulation of apoptotic process.” These immune-related and apoptotic processes align with the observed gene expression differences, suggesting that H3K4me1 may regulate these functions. For instance, the higher expression of immune-related genes in groups other than Young Males matches the enriched immune processes identified.
Additionally, mitochondrial processes like “mitochondrion organization” and “mitochondrial membrane organization” are highly enriched. These processes are crucial for maintaining cellular energy balance and preventing excessive ROS generation (Murphy 2009). The upregulation of genes involved in stress response and metabolism in Young Males, such as MTF1 and NR5A2, aligns with the enriched mitochondrial processes.
These findings provide evidence that H3K4me1-mediated epigenetic modifications influence key biological processes, including immune response, apoptosis, and mitochondrial organization, across different age and sex groups. The variations in gene expression across different age groups further support our initial hypothesis, showing a significant correlation between H3K4me1 peaks and the regulation of these critical biological functions.
Our findings support the hypothesis that age-related enhancer changes, marked by differential H3K4me1 peaks, play a crucial role in the regulation of genes involved in oxidative stress response and energy metabolism. The differential expression of genes like MEF2D and FOXJ3, along with the notable enrichment of biological processes related to apoptosis and cell differentiation, underscores the impact of oxidative damage and mitochondrial dysfunction in age-related cardiac functional decline. The analysis could be improved by taking data from people with a larger age gap, thereby providing a more robust understanding of how enhancer dynamics and associated gene regulation evolve over a broader age spectrum.
Having established the impact of H3K4me1 on gene expression and biological processes, we now turn our attention to another critical histone modification, H3K9me3, and its interaction with transcription factors like CTCF.
Histone modifications, such as H3K9me3 (histone H3 lysine 9 trimethylation), play a critical role in regulating gene expression by modulating chromatin structure and accessibility. Transcription factors, like CTCF (CCCTC-binding factor), interact with these histone modifications to coordinate the repression or activation of gene expression at specific genomic regions. Co-binding of CTCF with histone modifications at promoter regions is essential for maintaining the precise regulation of cardiac genes.
We hypothesize that in aged LV tissue, the co-binding of H3K9me3 with cardiac transcription factors such as CTCF at specific promoter regions is reduced compared to young tissue. This reduction would reflect a loss of coordinated repression that could adversely affect cardiac function. Additionally, we propose that these changes in co-binding patterns may vary between male and female subjects, potentially contributing to sex-specific differences in cardiac aging.
To investigate this hypothesis, we conducted a co-binding analysis to examine the interaction between CTCF and H3K9me3 at specific genomic regions. This analysis aimed to identify differences in co-binding patterns between young and aged LV tissue and between male and female subjects. By comparing the number and distribution of co-binding sites, we sought to understand how aging and sex influence the coordinated regulation of cardiac genes.
# load CTCF data
CTCF_om <- read.table("data/CTCF_om", header = FALSE)
CTCF_om <- GRanges(seqnames = CTCF_om$V1,
ranges = IRanges(start = CTCF_om$V2, end = CTCF_om$V3))
CTCF_ym <- read.table("data/CTCF_ym", header = FALSE)
CTCF_ym <- GRanges(seqnames = CTCF_ym$V1,
ranges = IRanges(start = CTCF_ym$V2, end = CTCF_ym$V3))
CTCF_of <- read.table("data/CTCF_of", header = FALSE)
CTCF_of <- GRanges(seqnames = CTCF_of$V1,
ranges = IRanges(start = CTCF_of$V2, end = CTCF_of$V3))
CTCF_yf <- read.table("data/CTCF_yf", header = FALSE)
CTCF_yf <- GRanges(seqnames = CTCF_yf$V1,
ranges = IRanges(start = CTCF_yf$V2, end = CTCF_yf$V3))
# Overlap CTCF with H3K9me3
overlap_om_CTCF_H3K9me3 <- unique(findOverlaps(CTCF_om, H3K9me3peak_om))
overlap_ym_CTCF_H3K9me3 <- unique(findOverlaps(CTCF_ym, H3K9me3peak_ym))
overlap_of_CTCF_H3K9me3 <- unique(findOverlaps(CTCF_of, H3K9me3peak_of))
overlap_yf_CTCF_H3K9me3 <- unique(findOverlaps(CTCF_yf, H3K9me3peak_yf))
# Convert overlaps to GRanges objects
overlap_om_CTCF_H3K9me3_gr <- pintersect(CTCF_om[queryHits(overlap_om_CTCF_H3K9me3)],
H3K9me3peak_om[subjectHits
(overlap_om_CTCF_H3K9me3)])
overlap_ym_CTCF_H3K9me3_gr <- pintersect(CTCF_ym[queryHits(overlap_ym_CTCF_H3K9me3)],
H3K9me3peak_ym[subjectHits
(overlap_ym_CTCF_H3K9me3)])
overlap_of_CTCF_H3K9me3_gr <- pintersect(CTCF_of[queryHits(overlap_of_CTCF_H3K9me3)],
H3K9me3peak_of[subjectHits
(overlap_of_CTCF_H3K9me3)])
overlap_yf_CTCF_H3K9me3_gr <- pintersect(CTCF_yf[queryHits(overlap_yf_CTCF_H3K9me3)],
H3K9me3peak_yf[subjectHits
(overlap_yf_CTCF_H3K9me3)])
# Combine all overlaps into one GRanges object
all_overlaps <- c(overlap_om_CTCF_H3K9me3_gr, overlap_ym_CTCF_H3K9me3_gr,
overlap_of_CTCF_H3K9me3_gr, overlap_yf_CTCF_H3K9me3_gr)
# Count overlaps
count_om <- length(unique(queryHits(overlap_om_CTCF_H3K9me3)))
count_ym <- length(unique(queryHits(overlap_ym_CTCF_H3K9me3)))
count_of <- length(unique(queryHits(overlap_of_CTCF_H3K9me3)))
count_yf <- length(unique(queryHits(overlap_yf_CTCF_H3K9me3)))
# Statistical comparison
counts <- data.frame(
Group = c("Old Male", "Young Male", "Old Female", "Young Female"),
Count = c(count_om, count_ym, count_of, count_yf))
# Plot the results
ggplot(counts, aes(x = Group, y = Count, fill = Group)) +
geom_bar(stat = "identity") +
theme_minimal() +
labs(title = "Co-binding of CTCF with H3K9me3",
x = "Group", y = "Number of Co-binding Sites")
Figure 13: Overlaps CTCF/H3K9me3
From Figure 13, it is evident that the young male group exhibits the highest number of co-binding sites of CTCF with H3K9me3, significantly surpassing the other groups. Old males have the second-highest number of co-binding sites, while old females and young females show relatively low levels of co-binding, with young females having the least.
The observed decline in co-binding sites of CTCF with H3K9me3 in older individuals supports our hypothesis that in aged left ventricular (LV) tissue, there is a reduction in the co-binding of H3K9me3 with cardiac transcription factors at specific promoter regions. This reduction likely reflects a loss of coordinated repression, which could impact gene regulation and ultimately affect cardiac function. Additionally, the low number of co-binding sites in females compared to males highlights potential sex-specific differences in chromatin organization and gene regulation, suggesting that hormonal or genetic factors might influence these interactions differently in male and female cardiac tissue.
Then, we conducted an annotation and enrichment analysis to determine the specific genomic regions where the peaks co-bound to and to identify the biological functions influenced by these interactions. This approach allowed for the pinpointing of pathways and processes that might be affected by the co-binding of CTCF and H3K9me3, shedding light on the regulatory mechanisms at play in different age and sex groups.
# Create and annotate and enrich function
annotate_and_enrich <- function(overlap_gr) {
# Annotate peaks to genes
peakAnno <- annotatePeak(overlap_gr, tssRegion = c(-3000, 3000),
TxDb = TxDb.Hsapiens.UCSC.hg38.knownGene,
annoDb = "org.Hs.eg.db")
# Perform GO enrichment analysis
go <- enrichGO(gene = as.data.frame(peakAnno)$geneId,
OrgDb = org.Hs.eg.db,
keyType = "ENTREZID",
ont = "ALL",
pAdjustMethod = "BH",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05)
go_df <- as.data.frame(go)
# Plot results
barplot(go, showCategory = 10)}
# List of overlaps to process
overlaps_list_ym <- list(overlap_ym_CTCF_H3K9me3_gr)
# Apply the function to each overlap
lapply(overlaps_list_ym, annotate_and_enrich)
## >> preparing features information... 2024-07-08 23:39:45
## >> identifying nearest features... 2024-07-08 23:39:45
## >> calculating distance from peak to TSS... 2024-07-08 23:39:45
## >> assigning genomic annotation... 2024-07-08 23:39:45
## >> adding gene annotation... 2024-07-08 23:39:49
## >> assigning chromosome lengths 2024-07-08 23:39:49
## >> done... 2024-07-08 23:39:49
## [[1]]
Figure 14: Enrichment Analysis - Overlaps CTCF/H3K9me3, Young Males
Figure 14 shows the enriched biological processes for young males. The analysis of young males’ data reveals significant enrichment in biological processes related to cell adhesion and synapse organization. Key processes include: Homophilic cell adhesion via plasma membrane adhesion molecules (GO:0007156), cell-cell adhesion via plasma-membrane adhesion molecules (GO:0098742), synapse organization (GO:0050808) and calcium-dependent cell-cell adhesion (GO:0016339). These findings suggest that the co-binding of CTCF and H3K9me3 in young males predominantly impacts cellular adhesion mechanisms and synaptic organization, which are essential for tissue integrity and neuronal communication. This indicates a strong influence of these co-binding events on maintaining cellular structure and function in the nervous system.
# List of overlaps to process
overlaps_list_yf <- list(overlap_yf_CTCF_H3K9me3_gr)
# Apply the function to each overlap
lapply(overlaps_list_yf, annotate_and_enrich)
## >> preparing features information... 2024-07-08 23:41:08
## >> identifying nearest features... 2024-07-08 23:41:08
## >> calculating distance from peak to TSS... 2024-07-08 23:41:10
## >> assigning genomic annotation... 2024-07-08 23:41:10
## >> adding gene annotation... 2024-07-08 23:41:19
## >> assigning chromosome lengths 2024-07-08 23:41:19
## >> done... 2024-07-08 23:41:19
## [[1]]
Figure 15: Enrichment Analysis - Overlaps CTCF/H3K9me3, Young Females
Figure 15 depicts the enriched biological process for young females, highlighting the “cornified envelope” process (GO:0001533). This indicates that the co-binding of CTCF and H3K9me3 in young females may specifically affect this biological pathway. The cornified envelope is a structure in the outermost layer of the skin that is essential for its barrier function. Finding this enrichment suggests that there may be unique regulatory mechanisms involving CTCF and H3K9me3 that are pertinent to the skin or related epithelial tissues in young females. Given that the count for this process is only 1, it signifies a single gene associated with this pathway, which highlights the specificity of the observed enrichment in the young female group.
Observations on Older Patients
Older patients’ data were not singularly used for enrichment plots as they did not show any significant results. This lack of enrichment could indicate that the co-binding of CTCF with H3K9me3 in older individuals does not significantly influence the same biological processes as seen in younger individuals, or it could reflect a more general decrease in the robustness of these interactions with age.
The significant enrichment in cell adhesion and synapse-related processes in young males and the specific enrichment in the cornified envelope process in young females suggest that CTCF and H3K9me3 co-binding affects different pathways based on sex and age. These findings support the hypothesis that aging and sex influence the molecular mechanisms of gene regulation via CTCF and H3K9me3 co-binding, with young individuals showing more pronounced and varied biological process enrichment.
Having established the age- and sex-specific differences in the co-binding patterns of CTCF and H3K9me3, we now turn our attention to a more focused aspect of sex-specific gene regulation. Particularly, we investigate the role of estrogen receptor motifs in these co-binding patterns, hypothesizing that females have distinct transcription factor co-binding patterns at accessible chromatin regions influenced by estrogen.
To test this hypothesis, we conducted a comprehensive co-binding analysis to examine the interaction between CTCF and H3K9me3 at specific genomic regions. This analysis aimed to identify differences in co-binding patterns between young and aged LV tissues and between male and female subjects.
A motif analysis was conducted to examine the binding patterns of transcription factors at accessible chromatin regions. This analysis aimed to identify differences in binding patterns between young and aged individuals as well as between male and female subjects. Specifically, the focus was on estrogen receptor motifs to uncover sex-specific regulatory mechanisms. By comparing the number and distribution of these motifs, insights into how aging and sex influence gene regulation through transcription factor binding were sought, potentially affecting genomic accessibility and transcriptional activity.
#load the data
ATACpeak_om <- read.table("data/ATACpeak_om", header = FALSE)
ATACpeak_om <- GRanges(seqnames = ATACpeak_om$V1,
ranges = IRanges(start = ATACpeak_om$V2, end = ATACpeak_om$V3))
ATACpeak_ym <- read.table("data/ATACpeak_ym", header = FALSE)
ATACpeak_ym <- GRanges(seqnames = ATACpeak_ym$V1,
ranges = IRanges(start = ATACpeak_ym$V2, end = ATACpeak_ym$V3))
ATACpeak_of <- read.table("data/ATACpeak_of", header = FALSE)
ATACpeak_of <- GRanges(seqnames = ATACpeak_of$V1,
ranges = IRanges(start = ATACpeak_of$V2, end = ATACpeak_of$V3))
ATACpeak_42yf <- read.table("data/ATACpeak_42yf", header = FALSE)
ATACpeak_42yf <- GRanges(seqnames = ATACpeak_42yf$V1,
ranges = IRanges(start = ATACpeak_42yf$V2, end = ATACpeak_42yf$V3))
# Extract cobinding sites of CTCF with H3K9me3
#overlap_om_CTCF_H3K9me3 <- unique(findOverlaps(CTCF_om, H3K9me3peak_om))
#overlap_ym_CTCF_H3K9me3 <- unique(findOverlaps(CTCF_ym, H3K9me3peak_ym))
#overlap_of_CTCF_H3K9me3 <- unique(findOverlaps(CTCF_of, H3K9me3peak_of))
#overlap_yf_CTCF_H3K9me3 <- unique(findOverlaps(CTCF_yf, H3K9me3peak_yf))
# Extract overlapping regions as GRanges objects
overlap_regions_om <- pintersect(CTCF_om[queryHits(overlap_om_CTCF_H3K9me3)],
H3K9me3peak_om[subjectHits(overlap_om_CTCF_H3K9me3)])
overlap_regions_ym <- pintersect(CTCF_ym[queryHits(overlap_ym_CTCF_H3K9me3)],
H3K9me3peak_ym[subjectHits(overlap_ym_CTCF_H3K9me3)])
overlap_regions_of <- pintersect(CTCF_of[queryHits(overlap_of_CTCF_H3K9me3)],
H3K9me3peak_of[subjectHits(overlap_of_CTCF_H3K9me3)])
overlap_regions_yf <- pintersect(CTCF_yf[queryHits(overlap_yf_CTCF_H3K9me3)],
H3K9me3peak_yf[subjectHits(overlap_yf_CTCF_H3K9me3)])
# Load motifs from MotifDb
human_motifs <- MotifDb::query(MotifDb, "Hsapiens")
estrogen_motifs <- MotifDb::query(MotifDb, "ESR1")
selected_motif <- estrogen_motifs[["Hsapiens-jaspar2022-ESR1-MA0112.3"]]
motif <- convert_motifs(selected_motif, class="TFBSTools-PWMatrix")
# Find motifs in ATAC-seq peaks
motif_hits_om <- motifmatchr::matchMotifs(motif, ATACpeak_om,
genome = BSgenome.Hsapiens.UCSC.hg38,
out="positions")
motif_hits_ym <- motifmatchr::matchMotifs(motif, ATACpeak_ym,
genome = BSgenome.Hsapiens.UCSC.hg38,
out="positions")
motif_hits_of <- motifmatchr::matchMotifs(motif, ATACpeak_of,
genome = BSgenome.Hsapiens.UCSC.hg38,
out="positions")
motif_hits_yf <- motifmatchr::matchMotifs(motif, ATACpeak_42yf,
genome = BSgenome.Hsapiens.UCSC.hg38,
out="positions")
# Find motifs in ChIP-seq peaks
motif_hits_overlap_om <- motifmatchr::matchMotifs(motif, overlap_regions_om,
genome = BSgenome.Hsapiens.UCSC.hg38,
out="positions")
motif_hits_overlap_ym <- motifmatchr::matchMotifs(motif, overlap_regions_ym,
genome = BSgenome.Hsapiens.UCSC.hg38,
out="positions")
motif_hits_overlap_of <- motifmatchr::matchMotifs(motif, overlap_regions_of,
genome = BSgenome.Hsapiens.UCSC.hg38,
out="positions")
motif_hits_overlap_yf <- motifmatchr::matchMotifs(motif, overlap_regions_yf,
genome = BSgenome.Hsapiens.UCSC.hg38,
out="positions")
# Calculate overlaps
overlap_om <- sum(overlapsAny(ATACpeak_om, motif_hits_om))
overlap_ym <- sum(overlapsAny(ATACpeak_ym, motif_hits_ym))
overlap_of <- sum(overlapsAny(ATACpeak_of, motif_hits_of))
overlap_yf <- sum(overlapsAny(ATACpeak_42yf, motif_hits_yf))
overlap_cobinding_om <- sum(overlapsAny(overlap_regions_om, motif_hits_overlap_om))
overlap_cobinding_ym <- sum(overlapsAny(overlap_regions_ym, motif_hits_overlap_ym))
overlap_cobinding_of <- sum(overlapsAny(overlap_regions_of, motif_hits_overlap_of))
overlap_cobinding_yf <- sum(overlapsAny(overlap_regions_yf, motif_hits_overlap_yf))
# Create a data frame from the overlap counts ATAC
overlap_data <- data.frame(
Group = c("Older Males", "Younger Males", "Older Females", "Younger Females"),
Overlaps = c(overlap_om, overlap_ym, overlap_of, overlap_yf)
)
# Create a data frame from the overlap counts CHIP
cobinding_overlap_data <- data.frame(
Group = c("Older Males", "Younger Males", "Older Females", "Younger Females"),
Overlaps = c(overlap_cobinding_om, overlap_cobinding_ym, overlap_cobinding_of,
overlap_cobinding_yf))
# Plot the Graphs
p1 <- ggplot(overlap_data, aes(x = Group, y = Overlaps)) +
geom_bar(stat = "identity", fill = "steelblue") +
labs(title = "Motif Overlaps in ATAC-seq Peaks", x = "Group",
y = "Count of Overlaps") + theme_minimal()
print(p1)
Figure 16: ESR1 Motif Overlaps in ATAC-seq Peaks
Figure 16 shows motif overlap counts in ATAC-seq peaks across different groups. Older Females have the highest number of overlaps, indicating greater chromatin accessibility. This aligns with established knowledge that estrogen receptor activity is more pronounced in females and crucial in regulating gene expression. Younger Males follow in motif overlaps, while Older Males and Younger Females have moderate and lowest counts, respectively.
Even though these results might be counterintuitive at first, one has to consider that there are several biological and technical factors that could explain this, such as biological variation, cell type composition and hormonal differences. Estrogen levels fluctuates significantly in females depending on their menstrual cycle, which could affect chromatin accessibility (Reed and Carr 2018).
p2 <- ggplot(cobinding_overlap_data, aes(x = Group, y = Overlaps)) +
geom_bar(stat = "identity", fill = "darkorange") +
labs(title = "Motif Overlaps in Co-binding Regions (CTCF and H3K9me3)",
x = "Group", y = "Count of Overlaps") + theme_minimal()
print(p2)
Figure 17: ESR1 Motif Overlaps in Co-binding Regions (CTCF/H3K9me3)
Figure 17 highlights motif overlaps in regions where CTCF and H3K9me3 co-bind. Younger Males exhibit significantly higher overlaps compared to Older Males and both Older and Younger Females. The absence of overlaps in females is likely due to estrogen receptor activity, which prefers accessible chromatin regions over repressive ones marked by H3K9me3 (Foret et al. 2014). This sex-specific difference in transcription factor binding patterns underscores the unique regulatory mechanisms driven by estrogen in females.
Our findings indicate that there are distinct differences in transcription factor co-binding patterns between males and females, particularly involving estrogen receptor motifs. While older females exhibited the highest number of ESR1 motif overlaps in ATAC-seq peaks, suggesting greater chromatin accessibility, younger males had the highest overlaps in regions where CTCF and H3K9me3 co-bind. This suggests a significant influence of estrogen receptor activity on chromatin accessibility in females, which may prefer accessible chromatin regions over repressied ones marked by H3K9me3.
The absence of overlaps in females for the co-binding regions of CTCF and H3K9me3 highlights a potential sex-specific regulatory mechanism driven by estrogen, supporting the hypothesis that females have distinct transcription factor co-binding patterns.
This project provides a comprehensive analysis of the roles of histone modifications and transcription factors with regard to age-related and sex-specific differences. Utilizing differential peak calling, clustering and visualization, gene association, and co-binding analysis, we have gained insights into the epigenetic landscape of left ventricular tissues. These analyses underscore the epigenetic differences between young and old, as well as between males and females, across various facets.
The observed decrease of H3K9me3 with age in men might be attributed to the overall loss of heterochromatin during aging (Lee et al. 2020). This suggests that epigenetic silencing of cardiac genes may be more prevalent in young males, potentially contributing to their higher susceptibility to heart diseases.
Through clustering, we found that women might have higher activity of cell-protective processes, highlighting potential sex-specific regulatory mechanisms in maintaining cardiac health.
Gene association analysis supports the theory of age-related enhancer changes impacting gene expression. Specifically, H3K4me1 peaks revealed an enriched association with cell death processes and mitochondrial processes, differentially regulated among groups.
The co-binding analysis showed that young males have the highest number of co-binding sites of CTCF with H3K9me3, while older males show a reduction in these sites, reflecting a possible loss of coordinated repression. In females, the specific enrichment in the cornified envelope process suggests unique regulatory mechanisms influenced by estrogen receptor motifs.
Overall, this project highlights the interplay between histone modifications, transcription factors, and gene expression in cardiac tissues. Our findings emphasize the importance of considering both age and sex in understanding the molecular mechanisms underlying cardiac function and disease. Further research is needed to explore the therapeutic potential of targeting these epigenetic modifications to mitigate age-related cardiac decline and improve heart health across different demographics.
While our study provides valuable insights, it was limited by the sample size. Furthermore, our data stems from healthy individuals, therefore statements about certain disease risks are based on the general consensus of the literature stating that women are protected from heart diseases by their sex hormones and other factors. This extra layer of extrapolation further decreases the power of our findings.
This study lays a foundation for future investigations into the epigenetic regulation of cardiac genes. By advancing our understanding of the molecular basis of cardiac aging and sex-specific differences, we can pave the way for more personalized and effective interventions to promote cardiovascular health.
In this project, the help of ChatGPT 4o was employed in a critical fashion for the generation of hypotheses and code, trouble-shooting, as wells as the interpretation of results and grammatical correctness.